Using an Automatic Weighted Keywords Dictionary for Intelligent Web Content Filtering
Authors
Abstract
Filtering web pages with inappropriate content is one of the major issues in intelligent network security. Any country that wants to control users' access to the web needs an intelligent filtering method that is both accurate and fast, and the problem has therefore attracted many researchers. Representing web pages in a machine-understandable form is one of the most important preprocessing steps, so a lower-dimensional description of web pages would be very effective, especially for deciding whether a page should be filtered out. In this paper, we propose an automatic method to detect forbidden keywords in web pages. We then define a new vector representation of web pages, named RWSF, which consists of the weighted sum and frequency of forbidden keywords in different parts of a web page. For this purpose, a ranking dictionary of keywords, including forbidden keywords, is used. To evaluate the proposed method, 2643 pages were used, consisting of 1311 normal pages and 1332 forbidden pages. Of these, 1851 pages were used to train the system and 792 pages to evaluate it. The system was assessed with several classifiers: k-nearest neighbor, support vector machines, decision tree, and artificial neural networks. The evaluation results indicate high efficiency and accuracy for the proposed method with all classifiers.
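As a rough illustration of the representation described in the abstract, the sketch below computes a weighted sum and a frequency of dictionary keywords for each part of a page and concatenates them into one vector. It is not the authors' implementation: the abstract does not give the dictionary entries, the weights, the list of page parts, or the tokenization, so `KEYWORD_WEIGHTS`, `PARTS`, `tokenize`, and `rwsf_vector` are all hypothetical names and values chosen only for this example.

```python
import re
from collections import Counter

# Hypothetical ranking dictionary: keyword -> weight.
# The paper builds its dictionary automatically; these entries are placeholders.
KEYWORD_WEIGHTS = {"casino": 0.9, "bet": 0.6, "jackpot": 0.4}

# Hypothetical page parts; the abstract only says "different parts of web pages".
PARTS = ("title", "meta", "body")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (an assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rwsf_vector(page):
    """Return [weighted_sum, frequency] for each part in PARTS, concatenated."""
    vector = []
    for part in PARTS:
        counts = Counter(tokenize(page.get(part, "")))
        # Frequency: total number of dictionary keyword occurrences in this part.
        frequency = sum(counts[k] for k in KEYWORD_WEIGHTS)
        # Weighted sum: each occurrence contributes its dictionary weight.
        weighted_sum = sum(w * counts[k] for k, w in KEYWORD_WEIGHTS.items())
        vector.extend([weighted_sum, float(frequency)])
    return vector


# Example page, also invented for illustration.
page = {
    "title": "Best casino bet",
    "meta": "",
    "body": "Bet now and win the jackpot. Bet again!",
}
vec = rwsf_vector(page)
print(vec)
```

With three parts the vector has six dimensions regardless of page length, which is the kind of low-dimensional description the abstract argues for. Such a vector could then be passed to any of the classifiers the abstract lists.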
Similar resources
Automatic Web Rating: Filtering Obscene Content on the Web
We present a method to detect automatically pornographic content on the Web. Our method combines techniques from language engineering and image analysis within a machine-learning framework. Experimental results show that it achieves nearly perfect performance on a set of hard cases.
Automatic Keywords Extraction – a Basis for Content Recommendation
This paper describes a use case for an application that recommends learning objects for reuse and is integrated in the authoring environment. The recommendations are based on the automatic detection of content being authored and the context in which this resource is authored or used. The focus of the paper is automatic keyword extraction, evaluated as a starting point for content analysis. The ...
An Intelligent Content Filter based framework for Mobile Web Services
Since mobile and internet penetration is increasing rapidly all over the world and the technology of mobile devices keeps advancing, most people use their mobile devices for day-to-day transactions, business, etc., and access services from the internet. Today, a lot of search engines are available, and these search engines dump information abundantly. Peopl...
Named Entity Recognition for Web Content Filtering
Effective Web content filtering is a necessity in educational and workplace environments, but current approaches are far from perfect. We discuss a model for text-based intelligent Web content filtering, in which shallow linguistic analysis plays a key role. In order to demonstrate how this model can be realized, we have developed a lexical Named Entity Recognition system, and used it to improv...
QoS-based Web Service Recommendation using Popular-dependent Collaborative Filtering
Since most organizations present their services electronically, the number of functionally equivalent web services is increasing, as is the number of users that employ those web services. Consequently, the users and the web services generate plenty of information, which makes it hard for users to find appropriate web services. Therefore, it is required to provi...
Intelligent Web Services System for automatic framework
Recently Web services have become a key technology which is indispensable for e-business due to its ability to provide the desired information or service regardless of time and place by integrating current application systems within a single business or between multiple businesses with standardized technologies using the open network and Internet. However, the current Web Services Retrieval Sys...
Journal title:
Journal of Advances in Computer Research
Publisher: Sari Branch, Islamic Azad University
ISSN 2345-606X
Volume 6
Issue 1, 2015